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^^ Abstract 

\C We consider the problem of efficient simulation estimation of the density function 

^~~^ at the tails, and the probability of large deviations for a sum of independent, iden- 

^_^ tically distributed, light-tailed and non-lattice random vectors. The latter problem 

py^ besides being of independent interest, also forms a building block for more com- 

n , plex rare event problems that arise, for instance, in queuing and financial credit risk 

• modeling. It has been extensively studied in literature where state independent ex- 

"^ ponential twisting based importance sampling has been shown to be asymptotically 

^ efficient and a more nuanced state dependent exponential twisting has been shown to 

^ have a stronger bounded relative error property. We exploit the saddle-point based 

representations that exist for these rare quantities, which rely on inverting the char- 
^v^ acteristic functions of the underlying random vectors. These representations reduce 

^ the rare event estimation problem to evaluating certain integrals, which may via im- 

C^^ portance sampling be represented as expectations. Further, it is easy to identify and 

approximate the zero- variance importance sampling distribution to estimate these in- 
tegrals. We identify such importance sampling measures and show that they possess 
the asymptotically vanishing relative error property that is stronger than the bounded 
^^ relative error property. To illustrate the broader applicability of the proposed method- 

ic ology, we extend it to similarly efficiently estimate the practically important expected 

I overshoot of sums of iid random variables. 
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1 Introduction 

Let {Xi : i > 1) denote a sequence of independent, identically distributed (iid) light tailed 
(their moment generating function is finite in a neighborhood of zero) non-lattice (modulus 
of their characteristic function is strictly less than one) random vectors taking values in 
SR*^, for d > 1. In this papeij^we consider the problem of efficient simulation estimation 
of the probability density function of Xn = - Y17=i -^i ^^ points away from EXi, and the 
tail probability P{Xn G A) for sets A that do not contain EXi and essentially are affine 
transformations of the non- negative orthant of K'^. We develop an efficient simulation 



^A very preliminary version of this paper appeared as |12) . 



estimation methodology for these rare quantities that exploits the well known saddle 
point representations for the probability density function of Xn obtained from Fourier 
inversion of the characteristic function of Xi (see e.g., [3], [9j and |21j). Furthermore, 
using Parseval's relation, similar representations for P{Xn G .4) are easily developed. To 
illustrate the broader applicability of the proposed methodology, we also develop similar 
representation for E{Xn '■ X^ > oljiii a single dimension setting {d = 1), for a > EXi, 
and using it develop an efficient simulation methodology for this quantity as well. 

The problem of efficient simulation estimation of the tail probability density function 
has not been studied in the literature, although, from practical viewpoint its clear that 
visual inspection of shape of such density functions provides a great deal of insight into 
the tail behavior of the sums of random variables. Another potential application maybe in 
the maximum likelihood framework for parameter estimation where closed form expres- 
sions for density functions of observed outputs are not available, but simulation based 
estimators provide an accurate proxy. The problem of efficiently estimating P(X„ S A) 
via importance sampling, besides being of independent importance, may also be consid- 
ered a building block for more complex problems involving many streams of i.i.d. random 
variables (see e.g., [23], for a queuing application; [16J for applications in credit risk mod- 
eling). This problem has been extensively studied in rare event simulation literature (see 
e.g., [5], [13], [15], [T7], [25], [26]). Essentially, the literature exploits the fact that the 
zero variance importance sampling estimator for P{Xn G A), though unimplementable, 
has a Markovian representation. This representation may be exploited to come up with 
provably efficient, implementable approximations (see [3] and [T9]). 

Sadowsky and Bucklew in [26j (also see ^0] ) developed exponential twisting based im- 
portance sampling algorithms to arrive at unbiased estimators for P{Xn £ A) that they 
proved were asymptotically or weakly efficient (as per the current standard terminology 
in rare event simulation literature, see e.g., [3] and [19] for an introduction to rare event 
simulation. Popular efficiency criteria for rare event estimators are also discussed later in 



Section 2.1). The importance sampling algorithms proposed by [26] were state indepen- 
dent in that each Xk+i was generated from a distribution independent of the previously 
generated {Xi : i < k). Blanchet, Leder and Glynn in f5| also considered the problem 
of estimating P{Xn £ A.) where they introduced state dependent, exponential twisting 
based importance sampling distributions (the distribution of generated X^^i depended 
on the previously generated {Xi : i < k)). They showed that, when done correctly, such 
an algorithm is strongly efficient, or equivalently has the bounded relative error property. 

The problem of efficient estimation of the expected overshoot E [{Xn — a) : Xn > a] 
is of considerable importance in finance and insurance settings. To the best of our knowl- 
edge, this is the ffist paper that directly tackles this estimation problem. 

As mentioned earlier, in this article we exploit the saddle point based representations of 
the rare event quantities considered. These representations allow us to write the quantity 
of interest a^ as a product Cn x /3„ where Cn ~ «-« (that is, Cnjctn — )• 1 as n — )• oo) and is 
known in closed form. So the problem of interest is estimation of /3„, which is an integral of 
a known function. Note that /3„ — )• 1 as n — ?■ oo. In the literature, asymptotic expansions 
for /3„ exist, however they require computation of third and higher order derivatives of the 
log-moment generating function of Xi. This is particularly difficult in higher dimensions. 
In addition, it is difficult to control the bias in such approximations. As we note later in 
numerical experiments, these biases can be significant even when probabilities are as small 
as of order 10~^. In the insurance and financial industry, simulation, with its associated 
variance reduction techniques, is the preferred method for tail risk measurement even 



Authors thank the editor for suggesting this application 



when asymptotic approximations are available (since these approximations are typically 
poor in the range of practical interest; see e.g., [T6]). 

In our analysis, we note that the integral /3„ can be expressed as an expectation of a 
random variable using importance sampling. Furthermore, the zero variance estimator for 
this expectation is easily ascertained. We approximate this estimator by an implementable 
importance sampling distribution and prove that the resulting unbiased estimator of an 
has the desirable asymptotically vanishing relative error property. More tangibly, the 
estimator of the integral /3„ has the property that its variance converges to zero as n — t- 
cxD. An additional advantage of the proposed approach over existing methodologies for 
estimating P{Xn £ •^) and related rare quantities is that while these methods require 
0{n) computational effort to generate each sample output, our approach per sample 
requires small and fixed effort independent of n. 

The use of saddle point methods to compute tail probabilities has a long and rich his- 
tory (see e.g., [Ij, [20] and [2I])- To the best of our knowledge the proposed methodology 
is the first attempt to combine the expanding literature on rare event simulation with the 
classical theory of saddle point approximations. 

The rest of the paper is organized as follows: In Section 2 we briefly review the popular 
performance evaluation measures used in rare event simulation, and the existing literature 
on estimating P{Xn S .4.). Then, in Section 3, we develop an importance sampling 
estimator for the density of X„ and show that it has asymptotically vanishing relative 
error. In Section 4, we devise an integral representation for P{Xn £ A) and develop an 
importance sampling estimator for it and again prove that it has asymptotically vanishing 
relative error. In this section we also discuss how this methodology can be adapted to 
similarly efficiently estimate E{Xn '■ Xn > a) in a single dimension setting. In Section 
5 we report the results of a few numerical experiments to support our analysis. We end 
with a brief conclusion and a discussion on some directions for future research in Section 
6. 

2 Rare event simulation, a brief review 

Let On = EnYn = / YndPn be a sequence of rare event expectations in the sense that 
a^ — 7" as n — 7- cxd, for non-negative random variables {Yn : n > 1). Here, En is the 
expectation operator under P„. For example, when a„ = P{Bn), Yn corresponds to the 
indicator of the event Bn- 

Naive simulation for estimating a„ requires generating many iid samples of Yn under 
P„. Their average then provides an unbiased estimator of an- Central limit theorem based 
approximations then provide an asymptotically valid confidence interval for an (under the 
assumption that EnY^ < oo). 

Importance sampling involves expressing an = f YnLndPn = En[YnLn], where Pn 
is another probability measure such that P„ is absolutely continuous w.r.t. Pn, with 
Ln = -rpr denoting the associated Radon-Nikodym derivative, or the likelihood ratio, and 

En is the expectation operator under Pn- The importance sampling unbiased estimator 
On of an is obtained by taking an average of generated iid samples of YnLn under P„. 
Note that by setting 

Yn_ 
EniYaY 



dPn = -^^i^dPn 



the simulation output YnLn is En(Yn) almost surely, signifying that such a Pn provides a 
zero variance estimator for a„. 



2.1 Popular performance measures 

Note that the relative width of the confidence interval obtained using the central limit 
theorem approximation is proportional to the ratio of the standard deviation of the es- 
timator divided by its mean. Therefore, the latter is a good measure of efficiency of the 
estimator. Note that under naive simulation, when Yn = I{Bn) (For any set D, I{D) 
denotes its indicator), the standard deviation of each sample of simulation output equals 
^Jan{^ — Oin) SO that when divided by a„, the ratio increases to infinity as a„ — )• 0. 

Below we list some criteria that are popular in evaluating the efficacy of the proposed 
importance sampling estimator (see [3]). Here, Var{an) denotes the variance of the 
estimator a„ under the appropriate importance sampling measure. 

A given sequence of estimators (a^ : n > 1) for quantities (a„ : n > 1) is said 

• to be weakly efficient or asymptotically efficient if 



^JVar{an) ^ 
hm sup j3^ < oo 

n— ^-oo CUn 

for all e > 0; 

• to be strongly efficient or to have hounded relative error if 

^/Var{an) 
iimsup < oo; 

n— >oo CUn 

• to have asymptotically vanishing relative error if 

hm ^E«!g3 = 0. 

2.2 Literature review 

Recall that {Xi : i > 1) denote a sequence of independent, identically distributed light 
tailed random vectors taking values in SR'^. Let {Xj, . . . , Xf) denote the components of 
Xj, each taking value in SR. Let F[-) denote the distribution function of Xi. Denote the 
moment generating function oi F by Af(-), so that 

M{e) := E [e'M = i?[e^i^K^2Xf+...+e,xf]^ 

where 6 = {01,02, ■ ■ ■ , 0d) and for x,y € ^'^ the Euclidean inner product between them is 
denoted by 

x-y := xiyi + X2y2 H h Xdyd- 

The characteristic function (CF) of Xi is given by 

ip{0) := E le'-'M = ^[e^(^i^K''2X?+...+e,xf)] 

where t = \/— 1. In this paper we assume that the distribution of Xi is non-lattice, which 
means that \lp{0)\ < 1 for ah 0e'St'^ -{0}. 

Let A(0) := lnM{0) denote the cumulant generating function (CGF) oi Xi. We define 
to be the effective domain of M{0), that is 



@:=[0= {01,02, ...,0d)e ^''\A{0) < oo} . 



Throughout this article we assume that G 0°, the interior of 6. 

The large deviations rate function (see e.g., [H]) associated with Xi is defined as 

A*(x) = sup(0-x-A(e)). 

This can be seen to equal 6 ■ x — A{9) whenever there exists 6 £ Q^ such that A' (9) = x. 
(Here, A' denotes the gradient of A). Now consider the problem of estimating P{Xn G -^)- 
Let dFe{x) = exp{6 ■ x — A(6))dF{x) denote the exponentially twisted distribution associ- 
ated with F when the twisting parameter equals 9. Let xq denote the argmin^g_4 A*(j;). 
Furthermore, let 9* G Q^ solve the equation A' (9) = xq. Under the assumption that such 
a 9* exists, [26j propose an importance sampling measure under which each Xi is iid with 
the new distribution function Fg* . Then, they prove that under this importance sampling 
measure, when A is convex, the resulting estimator of P{Xn G .4) is weakly efficient. See 
[3] and [T9] for a sense in which this distribution approximates the zero variance estimator 
for P{Xn G A). Since, A'(^*) = xq, it is easy to see that under the exponentially twisted 
distribution Fg*, each Xi has mean xq. 

As mentioned in the introduction, [5] consider a variant importance sampling measure 
where the distribution of Xj depends on the generated {Xi, . . . ,Xj-i). Modulo some 
boundary conditions, they choose an exponentially twisted distribution to generate Xj 
so that its mean under the new distribution equals n-)+i (""^o ~ X]i=i -^i)- They prove 
that the resulting estimator is strongly efficient under the restriction that A is convex 
and has a twice continuously differentiable boundary. Later in Section 5, we compare the 
performance of the proposed algorithm to the one based on exponential twisting developed 
by [26j as well as with that proposed by [5]. 

3 Efficient estimation of probabiUty density function of Xn 

In this section we first develop a saddle point based representation for the probability 
density function (pdf) of Xn in Proposition [I] (see e.g., [1], [9] and I2I])- We then de- 
velop an approximation to the zero variance estimator for this pdf. Our main result is 
Theorem [T| where we prove that the proposed estimator has an asymptotically vanishing 
relative error. 

Some notation is needed in our analysis. Let 

SR^ ■={{xi,X2,...,Xd) £?li'^\ xi>0'ii = 1,2,... 4. 

Denote the Euclidean norm of x £ ^"^ by |x| := -y/ii^j;. For a square matrix A, det(^) 
will denote the determinant of A, while norm of A is denoted by 

1 1^1 1 := max \Ax\ . 
N=i' 

Let A"{9) denote the Hessian of A{9) for 9 G G*^. Whenever, this is strictly positive 
definite, let A{9) be the inverse of the unique square root of A" (9). 

Proposition 1. Suppose A" (9) is strictly positive definite for some 9 G Q^. Furthermore, 
suppose that \(p\'^ is integrable for some 7 > 1. Then fn, the density function of Xn, exists 
for all n > J and its value at any point xq is given by: 

V27r/ ^/det{A"{9)) Jva-Rd 



where 
and 



i^iy, 0, n) = exp [n x r]{y, 0)] 
rj{y, 9) = ^y'A"{e)y + A{e + iy)-{e + ,y) ■ xq - A(0) + 9 ■ xq. (2) 



Proof. 

fnixo) = (:^) I Mj^jLt)e~'^'-^°Ut [M^^ is the MGF ofX„] (3) 

— 1 / M" ( - I e^'(*-''«) dt \My written in terms of M] 



d 



27 

—Y [ M"(ts)e-"^(^-^«) ds [substituting s = -] 
271"/ Jse^d n 



/ n \d fSi+Loo 1-62+1.00 re^+ico 

{^) / / ••• / e"[^(^)--"]rfsid.2---dsd (4) 



VZTTi/ ygj^— too J92—LOO J9j: — i<X) 

{^Y I exp [n {A(0 + .y) - (^ + .y) • xq}] (O't??/ 



2^j exp[n{A(e)-^-xo}] / ^ V(y, ^,") x exp <^ -n-y*A"(0)y [> dy 



d 

n\2 



Vexp[n{A(0)-0-2;o}] / ^(""^li', ^,n) x (/.(A(0)-iu;) dit; (5) 
271"/ J«;e5Rrf 

n \ f exp [n {A(0) - 61 • xo}] /" ,. -^ a(q\ q \ j.r \ ^ ra\ 

^ /^ w,„,^.x / i^in 2A{9)v,9,n)x4>{v)dv, 6 

27r/ y^det(A"(6')) y,;e5R<* 

where the equahty in ([3]), which holds for all n > 7, is the inversion formula applied to the 
characteristic function of X„ (see e.g, p^)- The assumption that {(pp is integrable ensures 
that I M (—)!", which is the characteristic function of X„, is an integrable function of t for 
all n > 7. The equality in Q holds, by Cauchy's theorem, for any 9 = {9i,92, ■ ■ ■ , 9^) in 
the interior of 0. The substitution y = n~^w gives ([5|, while (rol) follows from ([5|) by the 
substitution w = A{9)v. D 

For a given xq G K*^, xq / EXi, suppose that the solution 9* to the equation A'(^) = xq 
exists and 9* G &^. Then, the expansion of the integral in (nj) is available. For example, 
the following is well-known: 

Proposition 2. Suppose A"{9*) is strictly positive definite and \(pp is integrable for some 
7 > 1. Then, 

f i;{n-^A{9*)v,9*,n)x(l){v)dv = l + o(^]. (7) 

A proof of Proposition ^ can be found in [21] (see also [H]). For completeness we 
include a proof in the Appendix. It is also useful in following proof of Proposition [3j The 
proof uses the estimates (|32[) , (|33|) , ( 34 ) and Lemma [l] developed later in this section. 



3.1 Monte Carlo estimation 

The integral in M may be estimated via Monte Carlo simulation. In particular, this 
integral may be re-expressed as 

i;{n-^A{e*)v,e*,n)^g{v)dv, 

where 5 is a density supported on ^'^. Now if Vi, V2, . . . , Vn are iid with distribution given 
by the density g, then 



f r-\ - f '^\^ exp[n{A(r)-6'* ■ xp}] 1 ^ jjjn L.^^ ,.,,. ,.v-rvv ..x 



-:A{e*)V,,e*,n)cp{V,) 
V27ry ^det(A"(0*)) iV ^ g{Vi) 



is an unbiased estimator for fn{xo)- 

3.1.1 Approximating the zero variance estimator 

Note that to get a zero variance estimator for the above integral we need 

g{v) cc 'ip{n^^A{e*)v, 9*,n)(l){v) . 

We now argue that 

iP{n-^A{e*)v,e*,n)^l (9) 

for all V = o{nf>). We may then select an IS density g that is asymptotically similar to (j) 
for V = o(n6). In the further tails, we allow g to have fatter power law tails. This ensures 
that large values of V in the simulation do not contribute substantially to the variance. 
Further analysis is needed to see ([9]). Note from the definition of r][v, 6), that 

7?(0, 6) = 0, ?/'(0, ^) = and r]"\v, 6) = {if A"' {6 + lv) (10) 

for all 6, while 

ry'(0,r) = (11) 

for the saddle point 6*. Here r/' , i]" and r]'" are the first, second and third derivatives 
of 1] w.r.t. V, with 9 held fixed. Note that while 77' and r]" are d-dimensional vector and 
dx d matrix respectively, 77'" (u, 9) is the array of numbers: (( g^J^Jg^^ (t;, 9)))i<ij^k<d- 

The following notation aids in dealing with such quantities: If A = {aijk)i<ij^k<d is a 
d X d X d array of numbers and u = {ui,U2, • • . , Ud) is a d-dimensional vector and i? is a 
d X d matrix then we use the notation 

AQu= ^ aijkUiUjUk 

^<i,j,k<.d 

and 

A-kB = {Cijk)l<iJ,k<d , 

where 

Cijk — / ^ OimnpOmiOnjOpk • 
m,n,p 



Following identity is evident: 

AQ{Bu) = {Ai.B)Qu. (12) 

Since, it follows from the three term Taylor series expansion and ( 10|11 ) above, that 

i>{n-'2A{0*)v,0*,n) =exp|nr/(n-5A(r)w,r)| = exp I ^A'" (o* + in-'^ A{e*)v) Q {iA{e*)v)\ , 

continuity of A'" in the neighborhood of 6* implies ([9|. 

3.1.2 Proposed importance sampling density 



We now define the form of the IS density g. We first show its parametric structure and 
then specify how the parameters are chosen to achieve asymptotically vanishing relative 
error. 

For a G (0, oo), b G (0, oo), and a £ (l,oo), set 



I 6 X (j)(v) when \v\ < a 
giv) = { c u I I \ 

R" when \v\>a. 



(13) 



Note that if we put 



p:-- 



g{v) dv = h 



\v\<a 



where 



IG(w,x) 



|D|<a 
1 



(v) dv = bx IG 



e-*r-^ dt 



d a^ 
2' 2 " 



r(a;) ,0 
is the incomplete Gamma integral (or the Gamma distribution function, see e.g, |21j). 

(l-p) 



then 



C 



r dv 

J\v\>a \v\°' 



>0, 



provided p < 1. 





/ 

/ _ 
/ y^ 
1 / 
1 / 
1 / 


\ 
\ 

X \ 
\ \ 






1/ 0.3 








11 
// 
// 
// 


\v 
\\ 






// 

/ 

/ 0.2 


\\ 

\ 






/ 

/ 
/ 

/ 
/ 
/ 

/ / 01 


\ 
\ 
\ 

\ 
\ 
\ \ 






Jy 


\ \ 






1 , , , 








Figure 1: Dotted curve is the normal density function, while solid line is the density of 
the proposed IS density. 

The following Assumption is important for coming up with the parameters of the 
proposed IS density. 



Assumption 1. There exist ao > 1 and 7 > 1 such that 

\u\°"' \<fiu)\"' du < 00 . 

By Riemann-Lebesgue lemma, if the probability distribution of Xi is given by a density 
function, then |(/5(u)| — )• as |n| — )• 00. Assumption [I] is easily seen to hold when |(/7(u)| 
decays as a power law as \u\ — t- 00. This is true, for example, for Gamma distributed 
random variables. More generally, this holds when the underlying density has integrable 
higher derivatives (see |14j): If fc-th order derivative of the underlying density is integrable 
then for any ao, Assumption 1 holds with 7 > ^"° . 

To specify the parameters of the IS density we need further analysis. 

Define 



ipg{u) := Eg 



^i.u-(Xi-xo) 



e 



M{e) 

where Eg denotes the expectation operator under the distribution Eg. Let 

h{x) := 1 - sup \ipe*{u)\'^. (14) 

\u\>x 

Then < h{x) < 1, /i(0) = 0, h{x) is continuous, non-decreasing and h{x) t 1 as x | 0. 
Further, since 99 is the characteristic function of a non-lattice distribution, h{x) > if 
a; > 0. We define 

hi{y) = min{z | h[z) > y} for y G (0, 1). 

Then for any y G (0, 1) we have h{hi{y)) > y and hi{z) | as 2; | 0. 
Let {sn}^=i be any sequence with following three properties: 

1. Sn i as n — ;■ 00 

2. For any (3 positive, (1 — Sn)""n^ — )• as n — )• 00 

3. v/n/ii(s„) — )• 00 as n — )• 00 



Later in Section 5 we discuss how such a sequence may be selected in practice. Set 
^3(71) := hi{sn)- Then, it follows that if x > Ss{n) then h{x) > s„. Equivalently, 
|(/96c(n)| < y/1 — Sn for all u > S^^n). 

Let Kjnin and Kmax denote the minimum and maximum eigenvalue of A" (6*), respec- 
tively. Hence ■^^—— is the maximum eigenvalue of A"{9*)~^ = A{6*)A{6*). Therefore, we 
have 

-^ = \\A{e*)\\\ 

Next, put 52{n) = ^ Kmax^sin) ■ Then, ^/n52{n) — )• 00 and \v\ > 62(11,) implies 
\A{e*)v\ >S3{n). Also let 

^iH = / - 62{n) = W^^^^^(53(n), 

SO that \v\ < 62{n) implies |A(0*)i;| < (5i(n). 

Now we are in position to specify the parameters for the proposed IS density. Set 

a = ao 

and 

n62{n). 



Let Pn = bn X IG ( 2 ) ^ ) • For (7 to be a valid density function, we need pn < i- Since 



JQ I d a„ 



2' 2 



1 , select bn to be a sequence of positive real numbers that converge to 1 in 



such a way that bn < 1/IG 



d K 
2' 2 



lim -p 



and 



d-\-a. 

1 - s„rn 2 



1 - 6„ X IG 



2' 2 



0. 



(15) 



For example, 6„ = 1 — n~^ for any ^ > satisfies (15). For each n, let Qn denote the pdf 
of the form ( 13) with parameters a, a„ and 6„, chosen as above. Let En and Vavn denote 
the expectation and variance, respectively, w.r.t. the density gn- 

Theorem 1. Suppose Assumption\^holds and 9* G Q^. Then, 



En 



V^2(n-2A(r)y,(9*,n)(/)2(y) 

So 



7p^{n-2A{e*)v,9*,n)(p'^{v) 



eSR'' 



gn{v) 



dv = 1 + o{n 2 ) . 



Consequently, from Proposition^ it follows that 

'i^{n-lA{e*)Vi,d\n)4>{Vi) 



Varn 



gniVi) 



as n —)• 00 , 



so that the proposed estimators for {fn{xo) : n > 1) have an asymptotically vanishing 
relative error. 



We will use the following lemma from |14) . 
Lemma 1. For any A, /3 G C, 

|exp(A)-l-/3|< f|A-/3| + ^ 



exp(u;) for all co > max{|A|, |/3|} . 

Also note that from the definitions of tp and r] it follows that, for any 9 £ Q, 

—\ip{n 2A{9)v,9,n) 

is a characteristic function. To see this, observe that 



C V • V ) _ 1 

exp-^ —>ip{n 2A{9)v,9, 



n] 



exp ■ 



Eq 



V ■ V 



+ r/ ( n 2 



'2A{9)v,9]\ 



^i.n'^A(e)v-{Xi~xo) 



'2A{e)v) 



(fe in 2 

Some more observations are useful for proving Theorem [T| 

Since r]'" is continuous, it follows from the three term Taylor series expansion, 

r]{v, 9) = 7?(0, 9) + r/'(0, 9)v + \{vfri"{^, 9)v + \ri"'{v, 9) Q v 

2 D 



(where v is between v and the origin) and ( |10[ ) and ( 11 ) above that there exists a sequence 
{cn} of positive numbers converging to zero so that 

1 3 

\v{v,0*) - ^v"'iO,9*)Qv\ < eniKmin)^\vf for \v\ < 6i{n), 



10 



or equivalently 



\rj{v, e*) - ^A'"(r) {iv)\ < enif^min)hvf for \v\ < 6i{n) . 

Furthermore, for n sufficiently large, 

1 



3! 



A"'{e*) (iv) 



^ ^ f^m.in ^ 



and 



\r]{v,6*)\ < -Kmin\v 



(16) 



(17) 



(18) 



for all \v\ < 5i{n). We shall assume that n is sufficiently large so that (17) and (18) hold 
in the remaining analysis. 



Proof. ( Theorem [T]) 

We write 



/ 



^\n-2A{e*)v,e\n)(t)\v 



9n{v) 



dv = Is + h . 



Where 



and 



\v\<^S2{n) 9niv) 

ij'^{n~^A{9*)v,e*,n)(j)'^{v) 



\v\>^S2{n) 



9n[v) 



dv 



dv. 



From (13) we get 



and 



For any c > 0, put 



1 



Is = — / i;'{n-2A{e*)v, e*,n)^{v) dv 



1 



Gn 



n J\v\>y^52{n) 



v\°'ij\n-'2A{e*)v,e*,n)(j)^{v)dv 



*,M,^/ ,(.).» (^/G(^.f) 



By triangle inequality we have 



|/:^-l| < 



^d{VnS2{n)) 



bn 



+ 



^diV^hin)) 



1 



Since as n — )• oo we have ^d {\/n62{n)) — )• 1 and 6„ — )• 1, the second term in RHS converges 
to zero. Writing C3(^*) = A'"(0*) * A{6*), for the first term we have 



h 



'^d{VnS2{n)) 



1 

bn 



|ti|<V^<52(n) 



|i'|<v^<52(n) 



iiP'^{n-^A{9*)v, 9\n)- l| (P{v) dv 



iP'^{n-'2A{6*)v, 6*,n)-l- %^ (iv) } (P{v) dv 



3\/n 



< 



1 1 

bn (27r)3 J\v\<^52{n) 
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tP\n-^A{e*)v,e*,n) - 1 - ^^^ (tv) 



e 2 dv. 



We apply Lemma (II]) with 



A" 



A = 2n X ?7 n~^iA{9*)v, d* and /3 = n ^ — - Ln~^A{9*)v 



Since '-^ = -P{v), where P is a homogeneous polynomial whose coefficients does not 
dependent on n, and \v\ < \/n62{n) implies \n~2A{9*)v\ < 5i{n), we have from (18), (17) 



and (16), respectively 



\X\ = 2n\7](n--2Ai9*)v,9*)\ < 2n^K^in\n-^2 Ai9*)v\^ < ^Krmn\\A{9*)\\''\v 



4 ' 



1 



3! 



|/?| = 2n 
and 

|A-/3| =2n 



A'"{9*)Q [Ln-^A{9*)v 



< 2nlKmin\n--2A{9*)v\^ < ^K™,„||^(r)||>|2 



T] (n--2A{9*)v,9A - ^A"'(r) (Ln-'^A{9*)v 



<2ntn{nmin)^\n 2A{9*)vf < 



*^„,|3 ^ 2e„|z;|3 



From Lemma [T| it now follows that the integrand in the last integral is dominated by 
't;n /2e^^l ,.\ r |t;n / \v\'' \ ( 2en\v\^ 1 



+ -PW. 



n 



Therefore we have 1^ = 1 + o{n 2 ). 
Also 

1 



1 



exp{-\v\^}iP^{n'2A{9*)v,9*,n) 



dv 



< 



{2'KYCn J\v\>^&2{n) 



ipg* {n~2A{9*)v 



2n 



dv 



{2TTYCn leR 



ife* [n'2A{9*)v 



dv 



7 d + g 



{i-snr-in^,/ww)\ 



(2vr)rfC„ 



\A{9*)-'u\''\^e*{u)y du 



7 d+Q: 



< D, 



< D 



1 - Sr,.r 2n 2 



7 d+o: 



(1 Snr 277, 2 J|^|>^52(n) |d| 
(1-Pn) 






|n|" |c^5i. (u)!'*' dn . 



weK 



where -Di is a constant independent of n. By Assumption [T| the above integral over u is 
finite. For large n we also have 



dv f dv 



' \v\>y/n52{n) \V\ J|d|>1 I'^I 

By choice of 6„ we can conclude that 14 — t- as n — )• 00, proving Theorem [T} 



D 
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4 Efficient Estimation of Tail Probability 

In this section we consider the problem of efficient estimation of P{Xn G A) for sets A 
that are affine transformations of the non-negative orthants 3f?!^ along with some minor 
variations. As in ([6j), dominating point of the set A plays a crucial role in our analysis. 
As is well known, a point xq is called a dominating point of A if xq uniquely satisfies the 
following properties (see e.g, [Hj, [B]): 

1. xo is in the boundary of A. 

2. There exists a unique 9* G Ik'^ with K'{6*) = xq. 

3. ^C {x\e* -{x-xq) > 0}. 

As is apparent from (j22], |26j . |6j), in many cases a general set A may be partitioned 
into finitely many sets {Ai : i < m) each having its own dominating point. From simu- 
lation viewpoint, one way to estimate P{Xn £ A) then is to estimate each P{Xn ^ Ai) 
separately with an appropriate algorithm. In the remaining paper, we assume the exis- 
tence of a dominating point xq for A. 

Our estimation relies on a saddle-point representation of P{Xn G A) obtained using 
Parseval's relation. Let 

Yn := \/n{Xn - Xq) 



and 

where xq = {xq, Xq, . . . , Xq) is an arbitrarily chosen point in k"^. Let /in,6»,xo (y) be the 
density function of Yn when each Xi has distribution function Fq, where, recall that 

dFg{x) = exp{e • x)M{e)~^dF{x) = exp{^ • x - A{e)}dF{x) . 

An exact expression for the tail probability is given by: 

P[Xn gA]= P[Yn G An,.,] = e-"i^-o-^(^)} I e-^^'-y^K,B,.M dy (19) 



which holds for any ^ G and any xq G K . The representation (19) is not very useful 
without further restriction on xq and 6 (see e.g., p2]). Again, assuming that a solution 
6* G G*^ to A'(^) = xo exists, where xq is the dominating point of A, define 

c(n,0*,xo)= / exp{— -v/n(6'* • y)} dy = n2 / exp{— n(0* • lu)} dw 

Jy&An,x, Jwe{A-xo) 

We need the following assumption: 
Assumption 2. Vn, c{n,6* ,xq) < oo. 

Since xq is a dominating point of A, for any y G An,xoj 'we have 6* • y > 0. Hence, if 
^ is a set with finite Lebesgue measure then c{n, 9*,xo) is finite. Assumption p] may hold 
even when A has infinite Lebesgue measure, as Example [T] below illustrates. 

When Assumption ^ holds, we can rewrite the right hand side of (19) as 

c(n,r,xo)e-"^^*-^o-^(^*)> [ rn,e*,.oiy)Ke*,.oiy)dy (20) 

JyGAn,XQ 
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where 



is a density in ^'^. 



' n,9* ,X() 



(y) 



c(n,e",xo) waen y t yin,xo 

otherwise 



(21) 



Let Pn,e*,xo{t) denote the complex conjugate of the characteristic function oirnfi*^xo{y)- 
Since the characteristic function of h{n,9* ,xo) equals 



-ity/nxQ 



M{e*) 



by Parseval's relation, (20) is equal to 

d 



c(n,r,i„)e-»l''--"-Wl)(£) / ,p„,».,,.(t)e- 



2vry Jt&Rd 



ity/nxQ 



M{e*) 



dt. (22) 



This in turn, by the change of variable t = A(9*)v and rearrangement of terms, equals 
c(n,r,xo)e-'"^^*-^o-^(''*» / 1 



V^det(A"(^*)) 



27r 



i)e5R<* 



Pn,e',xo{A{e*)v)i^{n-2A{e*)v, e*,n)4>{v) dv. 



(23) 



We need another assumption to facilitate analysis: 
Assumption 3. For all t G SR'^, 

lim Pn,9*,xoit) = 1- 

n— >co 

Proposition 3. Suppose A has a dominating point xq , the associated 9* G 0° and A" {9* 

is strictly positive definite. Further, Assumptions^ and^ hold. Then, 

d 



P[Xn G A] 



1 Y c(n,r,2;o)e-"{^*-^o-^(^*» 



27r 



^det(A"(0*)) 



or, equivalently by (23) 



lim / pnfi*,xM{G*»i'{n-2A{9*)v,9\n)(l){v)dv = l. 



(24) 



(25) 



Proof of Proposition [3] is omitted. It follows along the line of proof of Proposition [2] 
and from noting that: 



lim / pnfi*,xo{A{9*)v)4>{v)dv = 1, 

"-^OO Jy(Z^d 

lim / ViVjVkPn,e*,xo{A{0*)v)(l){v)dv = G. 

>d 



Let g be any density supported on JR . If Vi, V2, . . . , Vn are iid with distribution given 
by density g, then the unbiased estimator for P[Xn S A] is given by 



P[Xn G A] 



1 y c(n,r,xo)e-"^^*-^«-^(^*» 
2^ 



1 



AT 



^det(A"(e*)) 
Pn,e',xo{A{9*)VJ)^P{n-U{9*)Vj,9\n)c|){Vj) 



S~^ Pnfi* 



i=i 



5(^,; 



(26) 
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Note that for above estimator to be useful, one must be able to find closed form expression 



for c{n,9*,xo) and Pn,0*,xo{t) or these should be cheaply computable. In Section 4.1, we 
consider some examples where we explicitly compute c{n,6*,xo) and Pn,e*,xo ^^^ verify 
Assumptions [2] and [3| 

Theorem 2. Under Assumptions\l^ [^ and[^ 



Er: 



9liV) 



1 + o\n 2 j as n ^ oo, 



where Qn is same as Theorem^ Consequently, by Proposition^ it follows that 



as n — 7- oo 



Var„ 



P[Xn G A] 



and the proposed estimator has asymptotically vanishing relative error. 
The proof of Theorem [2] is given in the appendix. 

4.1 Examples 

Example 1. Let A = xq + K^, where xq = (xg, Xg, . . . , Xq) is a given point in JR'^. Further 
suppose that Vi = 1, 2, . . . , d, 6* > 0. It is easy to see that existence of such a 9* implies 
that Xq is a dominating point for A. It also follows that Assumption [2] holds and 

c{n,e*,xo) = —, . 



n2( 



l''2 



It can easily be verified that 



Pn,e- 



,xo{ti,t2,...td) = Yl ,f. 



Therefore Assumption [3] also holds in this case. By Proposition [3j we then have 

gn{A(r)-r-xo} 



P\Xn - xo G ^\\ ~ 



(27r)2n5y/det(A"(^*))0J 



2 



By Theorem [2j 



„n{A(e*)-e*-xo} 



(27r)2n2y/det(A"(0*))0* 



*/3* 



E 



V'(n-2A(r)yj,r,n)0(yj) 



9d ^^nti(i + 



tef^(e*)VS- 



/ne* 



(27) 
is an unbiased estimator for P[X„ — xq € 3?'J_] and has an asymptotically vanishing relative 
error. 



Example 2. For ^<d! <d, let 

g+ :={(xi,X2,...,Xd) GSJi'^l Xi >0 V 0<i<(i'}. 

Suppose we want to estimate P[X„ G ^], where, now ^ = xq + Q\i and xq is a given 
point in SR'^ (see Figure 2(b) ). We proceed as in Example [l| In this case Equation (19) is 



P[X ^A\ = e-"{^-o-A(e)} / e-v^(^-^)/i„,,,,„(y) dy 

JyeQ+, 



(28) 
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(a) yi = 10 + 5R^ 



(b) A^xo + Qt 




(c) A^xo + B^i. {d)A^xo + BQ+ 

Figure 2: A is shown as shaded region {d = 2). 



We now assume that 6* > 0, \/i < d' and 9* = 0\/i > d' 



Dividing the right hand side of equation ( 28 ) by ^/nOl , \fnQ\ , ■ ■ ■ , V^^d' ^ and inte- 
grating out yd' +1, yd' +2, . . . , yd we obtain 



,n{A(e*)-r-zo} 



n 2 ( 



l'^2 



d' 



„ Jy,>OVi<d' \-^^ 



YlV^e^e-^'^^y' 



d' 



'<i<d 



hn,e*,xo{y) n ^y« n^^*' 



i=d'+l 



i=l 



which we can write as 

„n{A{0*)-6l*-a;o} 



n 2 



l"'2 



/ d' \ d' 

n V^ne--^''^"^' K0*,xoiyi,y2, ..-, yd') H dy^, 



where /in,6i*,xo(yi) 2/2, • ■ ■ , ?/d') is the density function of {Y^^Y"^ , . . . , Y'^ ) under the mea- 
sure induced by Fq*. Thus, the problem reduces to that in Example [I] with dimension d' 
instead of d. In this case, 

c{n,e*,xo) = -—, 



n 2 



l"'2 



and 



p{n, 
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Thus, both the Assumptions [2] and [3] hold and we have 

pn{A(r)-e*-x'o} 
P[Xn G A] 



(27rn) T ^det{A"{e*))eie^ • • • 0*, 
Furthermore, the associated estimator has an asymptotically vanishing relative error. 



Example 3. When A = xq + B^'^ and B a nonsingular matrix (see Figure 2(c)), the 
problem can also be reduced to that considered in Example [T] by a simple change of 
variable. Set y = B~^z. Then, it follows that for any 6 

c{n,e,xo) = det{B) / exp{- ^/n{B'^ 9 ■ z)} dz . 

Now if we assume that all the d components of BO* are positive, then as in Example [l| 
both the Assumptions [2] and |3] hold. 

Similar analysis holds when A = xq + BQ'^,, {1 < d' < d), and B a nonsingular matrix. 
Then, simple change of variable y = B~^z reduces the problem to that in Example [2| 





(a) (b) 

Figure 3: Set A^'^ is the region labeled i {i = I, 2, 3, A'-^^ C A^^^ C A^^h) 



Example 4. In above examples we have considered sets A which are unbounded. In this 
example we show that similar analysis holds when the set A is bounded. Consider the 



three increasing regions {Ai : i = 1, 2, 3) as depicted in Figure 3(a) Here ^3 corresponds 
to region A considered in Example 1 . xq is the common dominating point for all the three 
sets. Again suppose that Vi = 1, 2, . . . , d, 9* > 0. Suppressing dependence on xq and 9* , 



for i = 1,2, let 



.W .- 



ye^/n{A^^)~xo) 



exp{-^/n{9* ■y)}dy 



and 



fi>it) :-- 



1 



exp{-if • y - y/n{9* ■ y)] dy. 
If A^^' is the d-dimensional rectangle given by Yii [^^O' ^0 + ^i\ then 



.(1) 



^1 _ e-"^i'^i)(l - e""^2^2) . . . (1 _ e-"-'^*d^'i) 



n^t 



1"2 
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Therefore, it folfows that Assumption |3| holds for A^^'. Also note that, 

\£\t) - 1| < ^ / exp{-MO* ■ y)} \e-''-y - 1| dy 



< -^. [ exp{-r • z} 

n2Cn -'zen(^(2)-xo) 

< -^ [ expl-r-z} 

n2c„ JzeKj 



U-z 



e ^ — 1 



dz 



Lt-Z 



e V" — 1 



dz. 



Since the last integral converges to zero, it follows that Assumption [sj holds for A^"^'. 
Similar analysis carries over to sets as illustrated by Figure |3(b)| under the conditions as 
in Example 3. 

In Example 1 we assumed that Vi = 1,2, . . . ,d, 9* > 0. In many setting, this may 
not be true but the problem can be easily transformed to be amenable to the proposed 
algorithms. We illustrate this through the following example. Essentially, in many cases 
where such a 9* does not exist, the problem can be transformed to a finite collection of 
subproblems, each of which may then be solved using the proposed methods. 

Example 5. Let (Xj : i > 1) be a sequence of independent rv's with distribution same 
as X = (Zi, Z2), where Zi and Z2 are standard normal rvs with correlation p. Suppose 
A := (a, 6) + K+, that is A := {{zi,Z2)\zi > a and Z2 >b}. Solving A'(6'i, 6*2) = (a, b) we 
get 

--"-^^ and 9;-^-''' 



' l-p2 — "^ l-p2 

Thus, if min{|, -} > /> we have both 0* and ^2 positive, and we are in situation of Example 
1 Suppose - < yO so that ^2 < 0- Then making the change of variable Z3 = —Z2 we have 

P[Zi >a,Z2>b] = P[Zi > o] - P[Zi >a,Z3> -b]. 

Now for estimating the second probability we have both 0^ and 62 positive. Similarly, the 
first probability is easily estimated using the proposed algorithm. 

However, note that if (a, b) lies on {{zi, Z2)\z\ = pz2 or Z2 = pzi} we have one of 9\ or 
^2 zero, and consequently c{n,9\,92,a,b) is infinite. The proposed algorithms may need 
to be modified to handle such situations, however its not clear if simple adjustment to our 
algorithm will result in the asymptotically vanishing relative error property. We further 
discuss restrictions to our approach in Section 6. 

4.2 Estimating expected overshoot 

The methodology developed previously to estimate the tail probability P{Xn € A) can 
be extended to estimate E[X^ \Xn G A] for a G (Z+ — {0})'^. We illustrate this in a single 
dimension setting (d = 1) for a = 1, and A = (xq, 00) for xq > EXi. 

Let Sn = ^27=1 -^i- ^^ finance and in insurance one is often interested in estimating 
E[{Sn—nxo)\Sn > nxo], which is known as the expected overshoot or the peak over thresh- 
old. As we have an efficient estimator for P(X„ > 2:0)) the problem of efficiently estimating 
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E[Sn\Sn > nxo] is equivalent to that of efficiently estimating E[{Sn — nxQ)I{Sn > nxo)]. 
Note that 

E[{{Sn - nxo)I{Sn > nxo)] = V^E[YnI{Yn > 0)], 



where Yn = \/n{Xn — xq). Using (19) we get 



E[y„/(y„ > 0)] = e-"^'''-^"-^^'''^ / ye-v^(^*-^)V,.,,„(y)^2/, (29) 



where recall that 0* G is a solution to A'(0) = xq and hn^e*,xoiy) is the density of Yn 
when each Xi has distribution Fg* . Define 



oo 

*2n-1 



c{n,e*)= / yexp{-^/n{9*-y)}dy = {ne*^) 
Jo 



Hence, Vn, c{n,9*) < cxd. The right hand side of (29) may be re-expressed as 

c(n,r)e-"^^*-^o-^(^*)> / rn,e*{y)K,e*,.oiy)dy (30) 

Jo 

where, 

-^Kn wheny>0 ^3^^ 

otherwise 

is a density in 3f?+. 

Let Pn,e*{t) denote the complex conjugate of the characteristic function of rn,e*iy)- 
By simple calculations, it follows that 

1 

Pn,e*{t) 



1 _ _fi 2Lt 



and lim Pn,e*{t) = 1- Then, repeating the analysis for the tail probability, analogously to 



(23), we see that (30) equals 



r-——- I Pn,e*{A{e*)vmn-2A{e*)v,e*,n)ct>{v)dv. 

Y^27rA"(6'*) Jo 

As in Proposition [3j we can see that 

n \5 c(n,r)e-''^^*-^o-^(^*)> f I \^ g-n{r.xo-A(r)} 



E[{Sn-nxo)I{Sn > nxo)] 



^2ttJ y/det{A"{e*)) \2TTnJ e*^y/det{A"{9*))' 

so that 

E[{Sn - nxo)I{Sn > nxp)] J_ 

P[Sn > nxo] ^ 0* ■ 

Using analysis identical to that in Theorem [2j it follows that the resulting unbiased 
estimator of E[{Sn — nxo)I{Sn > nxo)] (when density Qn is used) has an asymptotically 
vanishing relative error. 

The above analysis can be easily extended to prove similar results for the case of 
Xi £ K'^ and a a vector of positive integers. 
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5 Numerical Experiments 

5.1 Choice of parameters of IS density 

To implement the proposed method, the user must first specify the parameters of the IS 
density gn appropriately. In this subsection we indicate how this may be done in practice. 
All the user needs is to identify a sequence {sn}^=i satisfying the three properties listed 



in Subsection 3.1.2 Once {sn}'^=i is specified, arriving at appropriate a, an, and bn is 
straightforward (see discussion before Theorem [ij Finding A{6*), Umax and Kmin are one 
time computations and can be efficiently done using MATLAB or MATHEMATICA). 

Clearly for any e E (0, 1), Sn := ^ satisfies properties 1 and 2. To see that property 
3 also holds, note that 



l-|(pe*(i)r= / (l-cos(t-x))dFe.(x), 

where Fq* {x) is the symmetrization of Fq* (x) (if G is the distribution function of random 
vector Y then symmetrization of G, denoted G, is the distribution function of the random 
vector Y + Z, where Z has same distribution as —Y). Since 

- — r- - - — r^ < 1 - cos(t • x) < ^ — -^, 
2! 4! ~ ^ ^ ~ 2! 

it follows that there exist a neighborhood U C ^'^ of origin and positive constants c and 
C, such that 

c\tf < l-|(^e.(t)|2 <C7|t|2 

for all t G [/. This in turn implies that there is a neighborhood V G ^ oi zero and positive 
constants c, C, ci and Ci such that 

cx^ < h{x) < Gx^ 

and 



civx < hi{x) < Cwfx 

for all X G y. Therefore ^Jnh\(^Sn) = \/nhi{n^^) > cn2~2 — )• oo for any e < 1. 

One may choose e close to 1 so that \/nhi{sn) grows slowly. Then, since a„ = 
^/n52{n) = y/ K,max^/nhi{sn) , CLn Can be taken approximately a constant over a specified 
range of variation of n. Also since p„ = bn x IG ( 2' ^ ) i^ what one uses for simulating 
from Qn, and p^ t 1) in practice for reasonable values of n, one may take pn as a constant 
close to 1. In our numerical experiment below, parameters for gn are chosen using these 
simple guidelines. 

5.2 Estimation of probability density function of X„ 

We first use the proposed method to estimate the probability density function of X„ for 
the case where sequence of random variables {Xi : i > 1) are independent and identi- 
cally exponentially distributed with mean 1. Then the sum has a known gamma density 
function facilitating comparison of the estimated value to the true value. The density 
function estimates using the proposed method (referred to as SP-IS method) are evalu- 
ated for n = 30, a„ = 2, q = 2 and pn = 0.9 (the algorithm performance was observed 
to be relatively insensitive to small perturbations in these values) based on A^ generated 
samples. Table [T] shows the comparison of our method with the conditional Monte Carlo 
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(CMC) method proposed in Asmussen and Glynn (2008) (pg. 145-146) for estimating 
the density function of Xn at a few values. As discussed in Asmussen and Glynn (2008), 
the CMC estimates are given by an average of A^ independent samples of nf{x — Sn-i), 
where Sn-i is generated by sampling {Xi, . . . , Xn-i) using their original density function 
/. Figure |4] shows this comparison graphically over a wider range of density function val- 
ues. As may be expected, the proposed method provides an estimator with much smaller 
variance compared to the CMC method. 



X 


True value 


SP-IS 


Sample 


CMC 


Sample 






estimate 


variance 


estimate 


variance 


1.0 


2.179 


2.185 


0.431 


2.360 


31.387 


1.5 


0.085 


0.087 


4.946 xlO""^ 


0.067 


0.478 


2.0 


1.094 xlO"^ 


1.105 X 10"^ 


1.066 X 10-9 


7.342 X lO"'^ 


3.341 X 10-1 



Table 1: True density function and its estimates using the proposed (SP-IS) method 
and the conditional Monte Carlo (CMC) for an average of 30 independent exponentially 
distributed mean = 1 random variables. For x = 1.0 and 1.5, the number of generated 
samples A^ = 1000 in both the methods, and for x = 2.0, N = 10, 000. 




0.9 


0.92 


0.94 


0.96 


0.98 


1.2 
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Figure 4: True density function and its estimates using the proposed (SP-IS) method 
and the conditional Monte Carlo (CMC) for an average of 30 independent exponentially 
distributed mean = 1 random variables. This plot illustrates the performance of the two 
methods over wide range of x values. In both simulations N = 1,000 at each point. 

5.3 Comparison with independent exponential twisting approach 

We consider a simple numerical experiment in dimension d = 3 to compare efficiency of the 
proposed method with the one involving state independent exponential twisting proposed 
by Sadowsky and Bucklew (1990). We consider a sequence of random vectors {Xi, Yi, Zi : 
i > 1) that are independent and identically distributed as follows: Let Ei,E2,E3 be iid 
exponentially distributed with mean 1. Define rvs X, Y and Z as 

X = ^{E,+ E2) 
Y=l{E2 + E-i) 
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1 



(^3 + ^l) 



Each (Xi, Yi, Zi) for i = 1, 2, . . . , n has the same distribution as (X, y, Z). We estimate 
the probabiUty P{Xn ^ x,Yn > y, Zn > z) for x = 1.4, y = 1.5 and z = 1.4 and different 
values of n. Table [2] below reports the estimates based on A^ generated samples. c„ denotes 
the exact asymptotic (the saddle point estimate) corresponding to the probability. These 
differ substantially from the estimated probability values, emphasizing the inaccuracy of 
Cn even for reasonably large values of n, and thus motivating simulation as a tool for 
accurate estimation of the associated rare probabilities. 

In these experiments we set a^ = 2, a = 3 and pn = 0.95. We also report the variance 
reduction achieved by the proposed method over the one proposed by Sadowsky and 
Bucklew (1990). This is substantial and it increases with increasing n. 

Table 2: Comparison of the proposed methodology (SP-IS) with optimal state inde- 
pendent exponential twisting (OET). In second and third columns we report the 95% 
confidence intervals for the tail probability under SP-IS and OET respectively. 



n=10 






c„ = 0.0122562 


N 


OET 


SP-IS 


Variance reduction 


1000 


(2.391 ± 0.494) X lO^^ 


(2.492 ±0.211) X 10-3 


5.48 


10000 


(2.546 ±0.163) X lO'^ 


(2.478 ± 0.073) x 10-^ 


4.98 


100000 


(2.503 ±0.05) X 10-3 


(2.479 ± 0.024) x 10-^ 


4.34 


n=20 






Cn = 4.490 X 10-4 


N 


OET 


SP-IS 


Variance reduction 


1000 


(1.621 ±0.373) X 10-4 


(1.383 ±0.102) X 10-4 


13.37 


10000 


(1.507 ±0.118) X 10-4 


(1.513 ±0.029) X 10-4 


16.55 


100000 


(1.506 ±0.037) X 10-4 


(1.474 ±0.009) X 10-4 


16.90 


n=40 






Cn = 1.704 X 10-6 


N 


OET 


SP-IS 


Variance reduction 


1000 


(7.349 ±2.346) x 10"^ 


(8.309 ± 0.364) x 10"^ 


41.53 


10000 


(7.77 ±0.757) X 10"^ 


(8.186 ±0.115) X 10-^ 


43.33 


100000 


(8.039 ±0.255) x 10"^ 


(8.181 ±0.037) X 10-^ 


47.50 


n=60 






Cn = 9.960 X lO-'J 


N 


OET 


SP-IS 


Variance reduction 


1000 


(5.411 ± 2.051) X 10-'^ 


(5.869 ± 0.257) x 10-^ 


63.69 


10000 


(5.734 ±0.668) x 10-^ 


(5.632 ±0.071) X 10-y 


88.52 


100000 


(5.666 ±0.214) X 10"^ 


(5.651 ±0.023) X 10-9 


86.57 


n=80 






Cn = 6.946 X 10-11 


N 


OET 


SP-IS 


Variance reduction 


1000 


(4.101 ±1.664) X 10-11 


(4.337 ±0.181) X 10-11 


84.52 


10000 


(4.615 ±0.622) X lO-^i 


(4.401 ± 0.059) X 10-11 


111.14 


100000 


(4.343 ±0.187) X lO-^^ 


(4.381 ±0.018) X 10-11 


107.93 


n=100 






Cn = 5.336 X 10-13 


N 


OET 


SP-IS 


Variance reduction 


1000 


(3.676 ±1.478) x lO-^^ 


(3.618 ±0.146) X 10-13 


102.48 


10000 


(3.923 ±0.533) x lO-^^ 


(3.637 ±0.049) x lO-i^ 


118.32 


100000 


(3.546 ±0.172) X lO-^^ 


(3.609 ±0.016) X 10-13 


115.56 
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5.4 Comparison with state dependent exponential twisting 

We compare the efficiency of SP-IS method for estimating the tail probabiUty P{Xn G A) 
with the optimal state dependent exponential twisting method proposed by [3j (referred 
to as BGL method). They restrict their analysis to convex sets A with twice continu- 
ously differentiable boundary whereas SP-IS method is applicable to sets that are affine 
transformations of the non- negative orthants 3?!^. The two methods agree in the single 
dimension and hence we compare them on a single dimension example. 

For a sequence of random variables {Xi : z > 1) that are independent and identically 
exponentially distributed with mean 1, P{Xn ^ 1-5) is estimated for different values of n. 
Table [3] reports the estimates based on different N generated samples. In this experiment, 
o„ = 2,a = 2 and pn = 0.9 for SP-IS method. BGL method is implemented as per [5] 
as follows: first Xi is generated using an exponentially twisted distribution with mean 
xq = 1.5. At each next step, the exponential twisting coefficient in the distribution used 

to generate X^^i is recomputed such that mean of the distribution is ^^° ^^^ — -■ The 

exponential twisting is dynamically updated until the generated ^11=1 -^i — ^^0 ^^ which 
point we stop the importance sampling and sample rest oi n — k values with the original 
distribution. In the other case, if distance to the boundary nxo — Yli=i -^i is sufficiently 

large relative to remaining time horizon n — k ( — ° n^V^ — ~ — ^ ^0) ) then we generate the 

next n 



k samples with exponentially twisted distribution with mean "^° ^V^ — ^' 



n 


N 


True value 
(exact asymp- 


BGL 


CoV 


SP-IS 


CoV 


VR 


CT 


BGL 


SP- 






totic c„) 














IS 




10^ 




9.276x10-4 


1.41 


9.055x10-4 


0.32 


20.38 






50 


10^ 


9.039x10-4 


9.127x10-4 


1.41 


9.036x10-4 


0.32 


19.77 


7.5 


0.9 




10^ 


(9.992x10-4) 


9.036x10-4 


1.41 


9.038x10-4 


0.32 


19.13 








10^ 




5.936x10-6 


1.44 


5.932x10-6 


0.28 


25.84 






100 


10^ 


5.924x10-6 


5.913x10-6 


1.45 


5.923x10-6 


0.29 


24.54 


15.4 


0.9 




10^ 


(6.261x10-6) 


5.928x10-6 


1.44 


5.921x10-6 


0.29 


24.20 








10^ 




3.355x10-1° 


1.48 


3.378x10-1° 


0.28 


25.83 






200 


10^ 


3.371x10-10 


3.381x10-1° 


1.46 


3.368x10-10 


0.29 


26.17 


32.0 


0.9 




10^ 


(3.473x10-1°) 


3.370x10-1° 


1.46 


3.374x10-10 


0.28 


26.92 








10-^ 




2.169x10-14 


1.46 


2.180x10-14 


0.29 


26.48 






300 


10^ 


2.176x10-14 


2.180x10-14 


1.47 


2.175x10-14 


0.28 


27.76 


48.0 


0.9 




10^ 


(2.226x10-14) 


2.173x10-14 


1.47 


2.179x10-14 


0.28 


27.89 







Table 3: SP-IS method has a decreasing coefficient of variation (CoV) and it provides 
increasing variance reduction (VR) over the optimal state dependent exponential twisting 
(BGL) method. Computation time per sample (CT), reported in micro seconds, increases 
with n for BGL method whereas it remains constant for SP-IS method. 

In this example, the true value of tail probability for different values of n is calculated 
using approximation of gamma density function available in MATLAB. Variance reduction 
achieved by SP-IS method over BGL method is reported. This increases with increasing 
n. In addition, we note that the computation time per sample for BGL method increases 
with n whereas it remains constant for the SP-IS method. Table [3] shows that the exact 
asymptotic Cn can differ significantly from the estimated value of the probability. As 
shown in Table [2| this difference can be far more significant in multi-dimension settings. 
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Figures: A = {{x^ , x'^)\x^ > {x'^)'^ + a} . 



thus emphasizing the need for simulation despite the existence of asymptotics for the rare 
quantities considered. 



6 Conclusions and Direction for Further Research 

In this paper we considered the rare event problem of efficient estimation of the density 
function of the average of iid light tailed random vectors evaluated away from their mean, 
and the tail probability that this average takes a large deviation. In a single dimension 
setting we also considered the estimation problem of expected overshoot associated with 
a sum of iid random variables taking a large deviations. We used the well known saddle 
point representations for these performance measures and applied importance sampling to 
develop provably efficient unbiased estimation algorithms that significantly improve upon 
the performance of the existing algorithms in literature and are simple to implement. 

In this paper we combined rare event simulation with the classical theory of saddle 
point based approximations for tail events. We hope that this approach spurs research 
towards efficient estimation of much richer class of rare event problems where saddle point 
approximations are well known or are easily developed. 

Another direction that is important for further research involves relaxing Assumptions 
[2] or |3] in our analysis. Then, our IS estimators may not have asymptotically vanishing 
relative error but may have bounded relative error. We illustrate this briefly through a 
simple example below. Note that many intricate asymptotics developed by litis |18J for 
estimating P[Xn £ A] correspond to cases where Assumptions p] or ^ may not hold. 

Example 6. Let (Xi : i > 1) be a sequence of independent rv's with distribution same 
as X = (Zi, Z2), where Zi and Z2 are uncorrelated standard normal rvs. Suppose A := 
{{zi,Z2)\zi > Z2 + a} for some a > (see Figure pi). As xq we choose the point (a,0) 
which is clearly the dominating point of the set A. Now for any 9i > and 62 it can be 
shown that 

^exp{^} 



c{n, 01,02, a) = / ex.p{-^/n{0lyl + 02y2)}dyldy2 

Solving k' [01,02) = (a,0) gives 01 = a and 0^ = 0. Also 

Pnfi*,xo{t) = I ^ ,ti I exp 

\ ay/n / 



n0f 
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Therefore Assumption [3] fails to hold: 



lim Pn,e*,xo{t) =exp 



4a 



Therefore, in this case the the family of estimator given by (26) may not have asymp- 



totically vanishing relative error. But, nevertheless, it can be shown to have bounded 
relative error. To see this, note that 



eSR'' 



p^,,e*(.A{e*)v)^{v)dv 



1 + 



1 
2a 



and 



P,,,e*{A{9*)v)''(t>{v)dv= (1+^^ 



(Here A"{e) = (J°) for all 9. So A{9*) = (H).) Also VI <i,j,k<d 

ViVjVkPxo,e* {A{e*)v)(l){v) dv = 0= / ViVjVkPxo,e* {A{e*)vf(t>{v) dv. 
Therefore as in Proposition |3j it follows that 

e"^" / 1 \"2 

2V^V^a2 V 2a y 

Mimicking the proof of Theorem (pi) it can be established that 



Varr, 



P[Xn G A] 



1 + 2^ 



1 + 



A Proofs 

Proof, (of Proposition pi) 

Let C3(^*) = A"'ie*)ir A{9*). We have 



i-eR'* 



i){n-2A{9*)v,9\n)(p{v)dv - 1 



= 


/ 

r 


— 


Jve^d 


< 


1 




(27r)f 



{^(n-2A(r)i;,r,n) - l}(t){v)dv 



^{n-'2A{9*)v, r , n) - 1 - ^^^ Uv) 

6\/n 



V'(n~2A(r)t;,r,n)-l 
(/i + h) , 



C3(r) 

6-v/n 



Q{iv) 



(j){v)dv 
v)dv 



where 






J\n 


'^A{e*)v\<5 


exp In X 


J\n' 


'iA(e*)v\>S 


exp < n X 



n- 



A"'{9*) 
l3! 



(Ln~^A{9*)v) 



r]{n~^A{9*)v,9*)\ 



1-n— ^0(m~M(r)u 



exp 



exp 



dv, 



dv. 
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We now discuss how the S above may be selected. 

Since rj'" is continuous, it follows from the three term Taylor series expansion, 

r]{v, e) = 7?(0, 6) + r/'(0, e)v + \{vfri"{^, e)v + \ri'"{v, 9) Q v 

2 D 



(where v is between v and the origin) , ( |10[ ) and (11) that for any given e we can choose 
6 small enough so that 

1 



\v{v,e*)-^r]"\0,e*)Qv\<e{K^in)-^\vf for \v\ < 5, 



or equivalently 



\v{v,e*) - ^A"'{e*) {iv)\ < e{K„,in)l\vf for |^| < <5. 



Since 



3! 



A"'{e*) {lv) 



^ ^ ^m,in ^ 



and 



\r]{v,6*)\ < -Kmin\v\'^ 
o 



(32) 
(33) 
(34) 



for all \v\ sufficiently small, we choose 6 so that (33) and (34) also hold for |t;| < 5. 
We apply Lemma (II| with 

\ = nx 7] (n'2A{e*)v,e*] and /3 = n ^^ (Ln^2A{9*)v] . 



Since '-^ = ^P(f), where P is a homogeneous polynomial with coefficients independent 
of n and for \n~2A{6*)v\ < 5 we have from (34), (33) and (32), respectively. 



\X\=n\rj(n'^A{9*)v,e*]\ < n^Kmin\n-'^A{9*)v\^ < ^Kmin\\A{9*)\\M^ = ^-^ 



n 



^A"'i9*)Q[in-^2Ai9*)v^ 



< n-K„,in\n '2A{9*)vf < -Kmin\\A{9*)\\^\v\'^ 



and 

|A-/3| =n 



r]{n-^2A{9*)v,9*) - -^A"'{9*)q {Ln-^A{9*)v 



<ne{Kmin)^\n ^A{9*)v\^ < 



From Lemma (IT]) it now follows that the integrand in /i is dominated by 



v^ I / e\vr 1 „/ s \ \ V 

exp i — f' X ( — ^ H — P{v) ) X exp 



2^ r 3v'\fe\v\\l . 



n 



Since e is arbitrary we have /i = o{n 
Next we have 

/2 < / , 

J|n"2A(6»*)t)|>(5 



exp<; -— J> VC"- ^^A{9*)v,9*,n) 



dv 



\n~ 1 A(e*)v\>& 



\A{B*)v\>S^ 



1 + 



UO*)Qv 



6 



. [n-2A{9*)v 



exp<5 -— - } dv, 



dv+ I ( 1 + 

i\A{e*)v\>5^ 



C3{0*)&V 



6 



exp<; -y > t?^^- 
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Let 55 < 1 be such that l^jg* (f)| < qs for \v\ > S. Then we have 



h < 



n— 7 



d 



ip0, (n ■2A{9*)v 



dv+ I ( 1 + 

'\A{e*)v\>5^ 



(3(0*) Qv 



= q^-^n-^ y'l A" {0*)\ / \^e*{u)f'du+ (1 + 

Jve^'' J\A{e*)v\>s^ 

It follows that I2 = o{n~°') for any a. 



6 



6 



exp<i -y ^ dv, 



exp<5 -— } dv. 



Proof, (of Theorem ^ 

The proof follows along the same line as proof of Theorem [T] We write 



/'L*,xo(^(^*)^)V''(^"^^(^*)^,^*>^)'^'W 



V&^d 



9n{v) 



dv = h + le 



where 



/, 



pl,e^,:,MmvW{n"A{e*)v,e\n)(tP{v) 



\v\<S2(n)^/n 



9n{v) 



dv 



1 



D 



^n J\v\<52{n)y^ 



ple*.Minv)i^\n-2Aie*)v,9\n)^{v)dv. 



plg*^,,{A{9*)v)iPHn-^A{e*)v,e*,n)(P^{v) 



dv 



Now 








\h - 1| 





1 

bn 


/ 






-Z |f |<(52(n)v«^ 




< 


1 

K 


/ 




< 


1 


1 

J\v\<52{n)^ 




< 


1 
K 


1 

J\v\<52{n)^/n 




< 


1 
6^ 


1 

J\v\<52{n)y/n 



\v\>52{n)^ 9n[v) 

-^ f pl,,^,,^{A{9*)v)\vri;\n-'2A{d*)v,e*,n)^\v)dv. 



ple*,.Min^)^^i'^'~'MO*)v.O\n)<l^{v)dv -I 



ple',.M^e*)v) [i,\n-"^A{e*)v, e\n)-l] <P{v) dv + o(l) 
pl,e',.M^O*» U\n~"2A{0*)v, 9\n) - 1 - %^ (iv)] <t>[v) dv 



3vn 



Pn,e*,.M{n^)Y 



il)'^{n^2A{e*)v,e\n)-l 



^p'^{n-2A{0*)v,e*,n) -1 






Qiiv) 



(3(0*) 
v) dv + 0(1 



QUv) 



v) dv + 0(1) 



Now as in the case of Theorem 111 we conclude that Is = 1 + o{n 2 ). Also, since 

I/el < ^f \vr\pn,e*,xoiA{9*)v)\^ ij\n-"2A{e*)v,e*,n) (t)\v)dv 

^n J\A{e*)v\>52{n)^ 



< 



1 



we conclude that /e — )• as n — )• cxd proving the theorem. 



exp{-t;2}V^(n~iyl(r)t;,r,n) 



dv. 



n 
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